Time-Frequency Sparsity by Removing Perceptually Irrelevant Components Using a Simple Model of Simultaneous Masking

Authors

  • Péter Balázs
  • Bernhard Laback
  • Gerhard Eckel
  • Werner A. Deutsch
Abstract

We present an algorithm for removing time-frequency components, found by a standard Gabor transform, of a “real-world” sound while causing no audible difference from the original sound after resynthesis. The representation is thereby made sparser. The selection of removable components is based on a simple model of simultaneous masking in the auditory system. Important goals were applicability to any real-world music and speech sound, integration of mutual masking effects between time-frequency components, coping with the time-frequency spread of such an operation, and computational efficiency. The proposed algorithm first estimates the masked threshold within an analysis window. The masked threshold function is then shifted in level by an amount determined experimentally, and all components falling below this shifted function (the irrelevance threshold) are removed. This shift provides a conservative way to deal with uncertainty effects resulting from the removal of time-frequency components and with inaccuracies in the masking model. The removal of components is described as an adaptive Gabor multiplier. Thirty-six normal-hearing subjects participated in an experiment to determine the maximum shift value for which they could not discriminate the irrelevance-filtered signal from the original signal. On average across the test stimuli, 36 percent of the time-frequency components fell below the irrelevance threshold.
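The thresholding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `irrelevance_mask`, the triangular spreading function on a linear bin axis, and all parameter values (`shift_db`, `slope_db_per_bin`, `masker_offset_db`) are assumptions made only for illustration, and the abstract does not specify the masking model or the size or sign of the shift. The sketch takes an array of time-frequency coefficients, estimates a masked threshold per analysis frame from the mutual masking of all components, shifts it in level, and returns a 0/1 symbol that plays the role of the adaptive multiplier.

```python
import numpy as np

def irrelevance_mask(coeffs, shift_db=-10.0, slope_db_per_bin=3.0,
                     masker_offset_db=10.0):
    """Return a boolean keep/remove mask for coefficients (freq x time).

    All parameters are hypothetical placeholders, not values from the paper.
    """
    # Component levels in dB; the small constant avoids log(0).
    level_db = 20.0 * np.log10(np.abs(coeffs) + 1e-12)
    n_bins, n_frames = level_db.shape
    # Distance in bins between every (maskee, masker) pair.
    idx = np.arange(n_bins)
    dist = np.abs(idx[:, None] - idx[None, :])
    keep = np.ones((n_bins, n_frames), dtype=bool)
    for t in range(n_frames):
        frame = level_db[:, t]
        # Assumed model: each masker produces a triangular threshold that
        # peaks masker_offset_db below its level and decays with distance.
        contrib = frame[None, :] - masker_offset_db - slope_db_per_bin * dist
        np.fill_diagonal(contrib, -np.inf)  # a component does not mask itself
        masked_threshold = contrib.max(axis=1)
        # Shift the masked threshold in level to get the irrelevance threshold.
        irrelevance_threshold = masked_threshold + shift_db
        keep[:, t] = frame >= irrelevance_threshold
    return keep

# Usage: one frame with a strong component, a weak neighbour, and a
# second strong component far away.
coeffs = np.zeros((64, 1), dtype=complex)
coeffs[10, 0] = 1.0
coeffs[11, 0] = 0.001
coeffs[40, 0] = 0.5
keep = irrelevance_mask(coeffs)
filtered = coeffs * keep  # pointwise multiplication by the 0/1 symbol
```

With a negative `shift_db`, the threshold is lowered and fewer components are removed, which is one way to read the conservative shift described above. A realistic version would compute the coefficients with a Gabor (short-time Fourier) transform and use a spreading function on an auditory frequency scale.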

Similar resources

Incorporation of temporal masking effects into bark spectral distortion measure

The objective of this paper is to extend a promising objective speech distortion measurement method, the Bark Spectral Distance (BSD) measure, with the auditory concepts of forward and backward temporal masking to improve its measurement accuracy. The results of this investigation show that automatic BSD-based speech quality ratings may be made to correlate better with existing MOS ratings by r...

Single channel speech enhancement by frequency domain constrained optimization and temporal masking

A speech enhancement algorithm is proposed that exploits the masking properties of the human auditory system. The enhancement is formulated as a frequency domain constrained optimization problem. The noise components of the noisy speech are suppressed by a gain function subject to the constraint that both the signal distortion and residual noise should fall below the masking thresholds. Tempora...

Perceptual irrelevancy removal in narrowband speech coding

A masking model originally designed for audio signals is applied to narrowband speech. The model is used to detect and remove the perceptually irrelevant simultaneously masked frequency components of a speech signal. Objective measurements have shown that the modified speech signal can be coded more efficiently than the original signal. Furthermore, it has been confirmed through perceptual eval...

Interactions of forward and simultaneous masking in intensity discrimination.

Intensity coding mechanisms are explored in a paradigm involving both forward and simultaneous masking. For intensity discrimination of 1000-Hz pure tone in quiet, a near-miss to Weber's law is observed. However, as more stimulus components are added to this relatively simple experiment, interactions among components produce a more complex pattern of results. An intense forward masker, while no...

Sinusoidal modeling using frame-based perceptually weighted matching pursuits

We propose a method for sinusoidal modeling that takes into account the psychoacoustics of human hearing using a frame-based perceptually weighted matching pursuit. Working on blocks of the input signal, a set of sinusoidal components for each block is iteratively extracted taking into consideration perceptual significance by using extensions to the well known matching pursuits algorithm. These...

Journal title:
  • IEEE Trans. Audio, Speech & Language Processing

Volume 18  Issue 

Pages  -

Publication date 2010